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ABSTRACT 

In  formulating  models  for  a  complex  system  graphical  representation  is  an  effective 
tool.  When  the  components  of  the  system  are  viewed  as  random  variables,  directed 
graphical  models  detail  the  nature  of  the  dependence  among  them.  Moreover,  if  for  each 
variable  the  conditional  distribution  is  provided  according  to  the  graph,  the  joint 
distribution  is  uniquely  determined.  Natural  questions  arise  about  the  static  behavior  of 
the  system  under  such  specification  as  well  as  its  response  to  information  (observed  levels 
of  some  of  the  variables).  Answers  to  these  questions  require  the  ability  to  calculate 
arbitrary  marginal  and  conditional  distributions.  In  complex  cases  (high  dimensional 
structures)  such  calculations  require  high  dimensional  integrations  and/or  summations. 
Most  of  the  work  to  date  has  taken  advantage  of  properties  of  directed  graphs  to  facilitate 
exact  calculations  but  is  limited  with  regard  to  distributional  assumptions  and  feasible 
system  sise.  Monte  Carlo  methods  for  such  calculations  can  accommodate  much  larger 
system  sise  with  arbitrary  dependence  structure  and  distributional  forms  yielding 
approximations  which  can  be  as  accurate  as  desired.  It  is  the  objective  of  this  paper  to 
detail  such  methodology.  An  illustration  is  provided  using  a  diagnostic  system  for 
congenital  heart  disease  in  neonates. 
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1.  Introduction. 

In  formulating  models  for  &  complex  system  graphical  representation  is  a  useful  too. 
For  non-deterministic  systems,  a  directed  graphical  form  provides  a  depiction  of  the  joint 
distribution  of  the  random  components  as  well  as  clarification  of  the  nature  of  the 
dependence  structure  present  among  them.  More  precisely,  in  such  graphs,  random 
variables  are  pictured  by  circles  or  dots  called  the  nodes  of  the  graph,  while  direct 
dependencies  between  pairs  of  variables  are  represented  by  arrows.  The  node  from  which* 
an  arrow  emanates  is  the  "parent”,  while  the  receiving  one  is  the  "child".  Thus  the  graph 
portrays  qualitative  dependence  amongst  the  variables.  The  induced  dependence  structure 
is  quantified  by  specification  of  conditional  distributions  at  each  node  given  its  parents. 
Conditional  densities  at  nodes  are  denoted  genetically  by  f(X|Y)  where  X  denotes  the 
variable  defined  at  the  node,  and  Y  its  parents.  Figure  1  shows  an  illustrative  six  node 
dependence  structure. 


[Insert  figure  1  about  here] 

The  provision  of  a  conditional  distribution  for  each  variable  in  terms  of  its  parents 
is  sufficient  to  uniquely  determine  the  distributional  model  for  the  entire  graph  (see  e.g. 
Whittaker,  1990,  chapter  3).  We  can  construct  the  joint  structure,  from  information 
provided  locally  without  requiring  profound  understanding  of  the  whole  system.  This 
facilitates  construction  of  the  model  and  provides  a  powerful  communication  mechanism 
between  statisticians  and  their  interdisciplinary  audience.  Most  significantly  however, 
graphical  representation  of  the  statistical  model  distills  dependence  to  necessary  parts.  For 
example  consider  a  "top  down"  factorization  of  the  joint  distribution  in  figure  1. 

f(a,b,...,f)  =  f(f|a,b,c,d,e)f(e|alb,c,d)f(d|a,b,c)f(c|a,b)f(b|a)f(a).  (1.1) 

What  does  the  graphical  model  tell  us  about  the  terms  on  the  right  hand  side? 
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Recall  that  two  (possibly  vector  vaiued)  random  variables  Xj  and  X2  are  conditionally 
independent  given  X3,  denoted  by  X^  II  XJX3,  iff  ^XpX^Xg)  =  f(x1  |x3)  f(x2|x3),  or, 
equivalently,  iff,  ffxjJx^Xg)  =  {(xjxg)  and  f(x2|x1,x3)  =  f(x2|x3).  For  example,  in  a 
Markov  chain  where  dependence  at  state  t  given  all  preceding  states  l,2,...,t-l  is  limited  to 
the  immediately  preceding  state  t-1, 

i(xtlx<_1^t_2»  ”ixo)  =  .(x$_ 2* — *xi) I x ^ i* 

In  response  models  for  Y  suppose  that  from  the  collection  of  explanatory  variables  x 
subset  x  is  sought  such  that,  given  this  subset,  dependence  on  the  rest  of  the  variables  is 
minimal.  Then  f(y|x)  =  f(y|xfl),  i.e.  Y_J[_x£|Xfl  where  x£  is  the  complement  of  Xft. 

Returning  to  (1-1),  the  graph  implies  the  following  conditional  independence 
relationships:  F  J|  (A,B,E)  |  (C,D)  from  which  f(f|a,b,c,d,e)  =  (f|c,d);  E  ||  (A,B,D)|C 
from  which  f(e|a,b,c,d)  —  f(e|c);  D  ||  C|(A,B)  which  implies  f(d|a,b,c)  =  f(d|a,b);  and 
A  ||  B  which  implies,  f(b|a)  =  f(b).  Hence  (1.1)  simplifies  to  f(a,b,c,d,e,f)  =  f(f|c,d) 
f(e|c)  f(d|a,b)  f(c|a)  •  f(b)  •  f(a). 

Effective  use  of  the  graphical  model  requires  the  ability  to  compute  arbitrary 
marginal  and  conditional  distributions.  At  the  system  level,  the  need  to  be  able  to 
calculate  desired  distributions  arises  as  we  try  to  understand  quantitatively  both  static  and 
dynamic  system  behavior.  In  some  cases,  such  distributions  can  be  used  for  model 
assessment,  in  the  sense,  of  determining  the  plausibility  of  our  proposed  model  of  the 
system  (see  section  3).  Once  comfortable  with  this  model,  such  distributions  enable  use  of 
the  system  for  diagnostic  purposes.  In  this  regard  we  seek  to  condition  given  observed 
information,  i.e.,  to  adjust  distributions  by  propagation  of  observed  information  through 
the  network.  In  the  absence  of  information  this  amounts  to  marginalization.  Essential 
here  is  that  very  little  if  any  restriction  should  be  imposed  on  data  acquisition,  i.e.,  on 
which  part  of  the  graphical  model  is  considered  given  (fixed). 

Tim  directed  graphical  models  we  employ  here  have  been  called  causal  models 
(Pearl,  1987),  graphical  association  models  (Lauritzen,  1990a),  and,  most  recently,  causal 
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probabilistic  networks,  CPN’s,  (Jensen,  Lauritzen  and  Olesen,  1991).  Among  the  many 
areas  of  applications  of  these  models  are  expert  systems,  artificial  intelligence,  genetics, 
reliability  trees  and  organizational  structures.  Motivation  for  a  great  deal  of  work  in  the 
statistical  uses  of  graphical  modeling  can  be  traced  to  the  seminal  paper  of  Darroch, 
Lanritzen  and  Speed  (1980)  on  the  analysis  of  log-linear  models.  This  article  introduced 
the  class  of  graphical  models,  exploring  in  an  elegant  and  powerful  way  the  implications  of 
conditional  independence. 

The  work  of  Lanritzen  and  Spiegelhalter  (1988),  renewed  interest  in  graphical' 
approaches  by  developing  an  efficient  procedure  for  the  analysis  of  general  discrete  models. 
They  assume  that  the  dependence  structure  is  completely  known,  i.e.,  the  conditional 
probabilities  of  an  event  given  all  influencing  factors  are  specified  a  priori,  either  relying  on 
expert  opinion  or  case  specific  data.  Subsequent  work  (e.g.,  Spiegelhalter  and  Cowell, 
1991)  allows  for  imprecision  in  the  model  specification  by  modeling  the  conditional 
distributions  at  each  of  the  discrete  nodes  as  parametric  families.  The  dependence 
structure  is  kept  intact,  while  at  each  variable  prior  distributions  are  added  to  the 
parameters  of  the  conditional  distribution  defined  on  it.  These  prior  distributions  are 
represented  by  parental  nodes  at  the  top  of  the  network.  Figure  2  extends  figure  1  in  this 
fashion.  Extending  the  graphical  model  in  this  way  seems  sensible.  The  resultant  joint 
distribution  incorporates  the  parameters  in  a  natural  hierarchical  fashion.  It  also  enables 
us  to  maintain  the  assumption  that  the  dependence  structure  in  the  graphical  model  is 
completely  specified. 


[Insert  figure  2  about  here] 

t 

An  important  parallel  development  is  work  by  Lauritzen  (1990a, b)  and  Wermuth 
and  Lauritzen  (1990)  which  formulates  the  Conditional  Gaussian  (CG)  family  of 
distributions,  in  an  attempt  to  analyze  mixed  models  i.e.  models  having  both  discrete  and 
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continuous  nodes.  Though  an  important  extension  of  existing  theory,  it  is  very  restrictive 
in  not  allowing  conditional— to-discrete  dependence,  i.e.  no  continuous  node  can  have  a 
discrete  child 

Based  upon  triangulation  of  the  graph,  Lauritxen  and  Spiegelhalter  (1988)  presented 
an  algorithmic  approach  for  analytic  calculations  in  a  purely  discrete  model  which  requires 
computation  only  at  the  local  level.  Approximate  versions  for  mixed  models  are  described 
in  Lauritzen  (1990b).  Though  elegant  these  methods  do  not  offer  a  unified  approach  to  the 
processing  of  general  graphical  models.  Distributional  assumptions  are  limited.  Tailored* 
approximations  are  ultimately  necessary  in  cases  where  modeling  makes  analytical 
calculations  intractable  as  is  the  case  with  non-conjugate  models.  High  model  dimension 
can  severely  strain  computational  resources.  Monte  Carlo  approaches  offer  the  promise  of 
accommodating  these  problems. 

Monte  Carlo  approaches  have  been  considered  by  earlier  researchers  though 
exclusively  for  binary  nodes.  Pearl  (1987)  uses  the  neighborhood  structure  inherent  in  the 
network  to  construct  the  complete  conditional  distribution  of  each  node.  Under  mild 
conditions  the  complete  conditional  distributions  uniquely  determine  the  joint,  (e.g.  Besag, 
1974).  Appropriate  sampling  of  them  produces  a  Markov  chain  whose  stationary 
distribution  is  the  one  we  desire.  Explicit  details  are  given  in  section  2.3  where  we 
generalize  Pearl’s  approach.  Another  example  is  the  Likelihood  Weighting  method 
(Shachter  and  Peot,  1989).  This  technique  uses  a  combination  of  conditional  independence 
relations  and  an  importance  sampling  scheme  to  derive  estimates  of  statistics  or  functions 
of  the  variables  from  samples  generated  according  to  the  discrete  joint  distribution. 
Extension  to  general  mixed  models  is  the  subject  of  section  2.2. 

The  key  point  in  our  work  is  a  change  in  focus.  Rather  than  customary  analytical 
attempts  to  obtain  features  of  a  high  dimensional  joint  distributions  we  propose  the  use  of 
sampling— based  approaches  to  approximate  such  features.  In  concert  with  subgraph 
approximations,  which  we  report  on  in  a  subsequent  paper,  Monte  Carlo  technology  can 
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process  very  large  systems  in  a  unified  manner,  with  the  prospect  of  "black  box"  software 
(we  note  the  current  development  of  BUGS  -  Bayesian  Inference  Using  Gibbs  Sampling  - 
as  reported  in  Thomas,  Spiegdhalter  and  Gilks,  1991). 

In  section  2  we  fully  describe  the  independent  or  importance  sampling  approach  as 
well  as  the  Markov  chain  (Gibbs  Sampling)  approach.  In  section  3  we  take  up  classes  of 
distributional  families  which  are  attractive  as  nodes  in  our  directed  graphical  model,  as 
well  as  the  question  of  model  assessment.  In  section  4,  we  illustrate  both  approaches  with 
the  analysis  of  a  twenty— node  mixed  graphical  model  for  diagnosis  of  congenital  heart* 
disease  in  neonates.  We  conclude  this  section  with  some  notation  and  a  few  formal 
definitions  which  will  be  used  in  the  remainder  of  the  paper. 

Central  to  our  graphical  methodology  is  the  concept  of  a  Directed  Acyclic  Graph 
(DAG).  A  DAG  consists  of  a  set  of  nodes  V  and  a  set  EcVxV  of  ordered  pairs  such  that  if 
(ij)€E  then  (j4)£E.  Groups  of  nodes  in  V  form  paths  if  for  each  two  nodes  i  and  j  in  the 
group,  (i,j)eE  or  (j,i)eE.  Acyclic  graphs  have  no  cycles  (paths  with  the  same  starting  and 
ending  node).  Cycles  lead  to  logical  implausibilities  (Pearl,  1986).  Probabilistically,  the 
joint  distribution  of  the  variables  is  not  uniquely  defined. 

One  of  the  fundamental  relations  expressible  by  a  DAG  is  that  of  precedence. 
Parental  nodes  precede  their  children,  in  a  causal  or  temporal  sense,  inducing  an  ordering 
of  the  nodes  in  the  graph.  We  enumerate  the  nodes  in  a  top-down  fashion,  starting  with 
those  without  parents  (source  nodes)  at  the  first  level,  proceeding  to  their  children  at  the 
next  level,  and  so  on  to  the  bottom  of  the  graph.  Nodes  allocated  to  the  same  level  can  be 
ordered  arbitrarily  permitting  many  equivalent  enumerations. 

If  node  i  precedes  j,  written  ixj,  according  to  an  ordering  in  the  graph,  then 
i€pr(j)  =  {s:s«j},  the  set  predecessors  of  j.  We  also  define  the  parental  set  of  i  as  pa(i),  and 
the  set  of  children  of  i  as  ch(i)  and  denote  the  variable  at  node  i  by  X..  Conditional 
independence  arises  naturally  for  a  DAG  in  terms  of  the  set  of  predecessors.  That  is,  if 
nodes  i  and  j  are  not  connected  and  i«j,  then 


LID  I  Pr(j-i)  < = >  j JJ 1  P»Q) 

where  pr(j-i)  is  the  set  of  predecessors  of  j  excluding  i.  This  is  equivalent  to  saying  that  j 
is  conditionally  independent  of  i  given  its  parents. 

As  a  result,  the  following  factorization  of  the  joint  density  of  the  random  variables 
at  the  nodes  in  V  ensues  (see  e.g.  Whittaker,  1990,  p.  73).  If  n  is  the  number  of  nodes  in 
V, 

f(V’xn)=  11  f(xvIPa(v))  (!-2) 

v-l 

This  factorization  was  illustrated  in  conjunction  with  figure  1  and  is  central  to  the 
stochastic  simulation  methodology  described  in  section  2  as  well  as  to  the  distribution 
theory  in  section  3. 

2.  Simulation  Approaches 

Desired  analysis  of  general  mixed  directed  graphical  models  requires  high 
dimensional  integration/summation.  Such  calculations  usually  can  not  be  carried  out 
analytically,  either  exactly  or  approximately.  Simulation  methods  offer  a  viable 
alternative  as  shown  in  section  2.1.  In  particular,  Monte  Carlo  methods  may  be  performed 
noniteratively  or  iteratively.  We  detail  these  approaches  in  subsections  2.2  and  2.3 
respectively. 

2.1  Why  Simulation? 

The  joint  density  of  the  set  of  variables  X  =  (X^X^,...^)  represented  by  the 
graphical  model  G  is  expressible  by  the  factorization  (1.2).  The  variables  X-  may  be 
either  discrete  or  continuous.  Explicit  discussion  of  forms  for  the  f  s  is  deferred  to  section 

3.  Following  earlier  remarks  we  assume  that,  whatever  form  these  specifications  take,  all 
the  f*s  are  fully  known,  i.e.,  included  in  the  set  pa(Xj)  are  any  parameters  in  the  density  of 
Xj.  To  answer  questions  we  may  be  interested  in  regarding  model  G  requires  the  ability  to 
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compute  arbitrary  conditional  distributions  and  their  features.  In  particular,  suppose  the 
conditional  information  fixes  a  subset  of  the  variables  (possibly  an  empty  set  if 
marginalisation  is  sought),  XQ,  of  size  nQ  to  specified  levels.  The  nodes  in  XQ  are  called 
evidence  nodes  in  Shachter  and  Peot  (1989).  Let  Xy  denote  the  complement  of  XQ,  i.e.  the 
unobserved  nodes.  We  seek  the  conditional  distribution  of  X„  given  X  =x„  as  well  as 

U  °  0  0 

/  / 

that  of  X  given  X0=xQ,  where  X  denotes  a  generic  component  of  Xu.  Exact  calculation 
of  f(Xu|XQ  =  xQ)  requires  an  n  -  nQ  dimensional  integration/summation  and  calculation  of 
f(X  |XQ  =  xQ)  requires  an  additional  nQ—  1  dimensional  integration/summation.’ 
Envisioning  n  and  possibly  nQ  to  be  large,  such  computation  will  not  be  feasible  by  exact  or 
approximate  analytic  methods.  Hence,  we  turn  to  simulation  strategies. 

As  observed  in  Smith  and  Gelfand  (1992),  there  is  an  essential  duality  between  a 
sample  and  the  density  (distribution)  from  which  it  is  generated.  Clearly  the  density 
generates  the  sample.  But  conversely,  given  a  sample  we  can  approximately  recreate  the 
density  and  its  features.  Thus  our  objective  is  to  draw  samples  from  f(Xu|XQ  =  xQ). 
Drawing  observations  from  the  joint  distribution  of  X  is  straightforward.  It  may  be  done 
in  a  "top  down"  fashion  using  tht  components  of  the  factorization  (1.2).  That  is,  we 
sample  all  source  nodes,  then  sample  all  their  children,  etc.  The  directed  graphical  model 
reveals  which  sampling  orders  are  thus  feasible  and  we  shall  in  fact  assume  that  the 
labeling  of  the  X’s  from  to  XQ  constitutes  a  feasible  sampling  order. 

In  attempting  to  sample  f(XQ  |  XQ  =  xQ)  one  might  take  draws  from  f(X)  and  retain 
those  meeting  the  evidence  XQ  =  xQ.  Such  rejection  sampling  is  called  logic  sampling  (see, 
e.g.,  Henrion  1988),  and  is  very  inefficient  when  tbe  nodes  in  XQ  are  discrete  but  the  event 
XQ  =  xQ  is  rare;  it  is  impossible  if  any  of  the  nodes  in  XQ  are  continuous. 

Note  that,  though  we  can  not  obtain  f(Xu|XQ  =  xQ)  explicitly,  we  do  know  its  form 
modulo  a  normalizing  constant,  i.e., 

f(x„lXo  =  Io)“f(Xu'Io)  <21> 

where  the  right  hand  side  of  (2.1)  is  given  in  (1.2).  In  fact  we  can  write 
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f(xu  I  Vxo)* Y  n Yf (xi  n  f (Xj  I  pifXj))  | 


VXo 


Vx„ 


'<V#> 


(2.2) 


Suppose,  as  is  traditional,  that  we  refer  to  what  is  observed  as  "data"  and  what  is 
unobserved  as  "parameters”.  Then  the  first  product  on  the  right  side  of  (2.2)  could  be 
considered  as  a  likelihood  since  here  all  the  X.  are  observed  while  the  second  product  could 
be  considered  as  a  prior  since  here  none  of  the  X.  are  observed.  Then  (2.2)  is  of  the  form 
Likelihood  i  Prior  as  in  customary  Bayesian  modeling  and  we  may  bring  to  bear  on  our 
problem  all  of  the  recently  discussed  sampling-based  technology  for  Bayesian  calculations. 
This  work  includes  noniterative  Monte  Carlo  using  importance  sampling,  as  in  Van  Dijk 


and  Kloek  (1983),  Stewart  (1983),  and  perhaps  best  summarized  in  Geweke  (1989)  as  well 
as  iterative  or  Markov  chain  Monte  Carlo  as  in  Gem  an  and  Geman  (1984),  Tanner  and 
Wong  (1987)  and  Gelfand  and  Smith  (1990)  and  perhaps  best  summarized  in  Tierney 
(1991). 


2.2  Independent  or  Noniterative  Monte  Carlo 


Since  we  can  not  sample  from  f(Xu|XQ  =  xQ)  directly,  we  develop  and  employ  a 
suitable  importance  sampling  density  (ISD).  More  precisely  from  a  density  say  g(Xu)  we 
draw  a  large  sample  denoted  by  Xut,  t  =  l,...,m  where  g(Xu)  has  the  same  support  as 
f(Xn|X0  =  xQ).  Then,  expectations  under  f(Xu|XQ  =  xQ)  e.g.  E(h(Xu) | XQ  =  xQ)  are 
approximated  by  the  weighted  sum 


m 

£  w 
t=l 


t 


(2.3) 


where  wt  =  f(Xut|XQ  =  xQ)/g(Xut).  Moreover,  resampling  the  Xyt  with  probabilities 
gt  =  wt/Ewt  produces  samples  approximately  distributed  according  to  f(Xy  |  XQ=x0)  (see 
Rubin,  1988,  Smith  and  Gelfand,  1992). 

The  selection  of  g  becomes  the  primary  concern.  The  more  closely  g  resembles 
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f(Xu>xo)  the  more  efficient  g  is  in  terms  of  sample  size  m.  Hence  a  good  ISD  is 
characterized  as  having  fairly  constant  weights  w^.  Such  stability  might  be  measured 
through  the  variance  of  the  wt  (Geweke,  1989)  or  their  entropy  (Ferguson,  1983).  A 
natural  choice  for  g  would  be  the  prior  II  f(X.  I  pa(X.)  which,  provided  that  it 

VXu  '  ‘  'PVo) 

is  proper,  it  can  be  readily  sampled  using  a  feasible  sampling  order.  This  choice  of  g  results 

which  would  naturally  be  called  likelihood 

weights  (see  Shachter  and  Peot,  1989),  i.e.,  bigger  weights  are  attached  to  "more  likely"- 
Xut’s.  Such  wt  are  not  likely  to  be  very  stable  since  we  would  not  expect  the  prior  to 
"match"  the  Likelihood  i  Prior  form.  Nonetheless,  such  weights  are  used  successfully  in 
the  example  of  Section  4. 

A  more  refined  selection  of  g  can  be  obtained  as  follows.  Partition  Xu  into  a  set  of 

d.  c 

discrete  and  a  set  of  continuous  variables  denoted  by  Xu  and  Xu  respectively.  We  consider 
an  ISD  which  samples  equally  likely  over  the  domain  of  xjj  and  then,  given  X^  =  x^, 
draws  x£  -  g(X^  |  Xjj  =  x^j).  The  Joint  density  of  Xu  under  this  ISD  is  proportional  to  the 
conditional  density  g(X^|X*j).  Thus  to  match  f(Xu,  xQ),  for  each  X^  =  x^  we  would 
choose  g(X^  |  x^J)  to  match  f(X^,  x^,  xQ).  Methods  for  developing  an  efficient  ISD  for  a 
nonstandardized  continuous  density  have  received  much  attention  lately  (see  e.g.  Geweke, 
1989,  Oh  fr  Berger,  1992,  West,  1992).  In  most  of  this  work  the  resultant  ISD  is  a  mixture 
of  normal  or  t  distributions. 

2.2.1  Performance  of  Independent  Monte  Carlo  Estimates 

Results  in  Geweke  (1989)  applied  to  hm  of  (2.3),  show  that  under  mild  conditions, 

(i)  ^a[hm-  Eh(Xu|XQ  =  xQ)]  is  asymptotically  normal  N(0,  a2),  where  a2  = 
c-2(*0)/  (M\)  -  Eh(Xu|X0  =  x0))2  •  x0)/8(xu)  )dxu.  Here  c(x0)  = 

j f(xu,xQ)dxu  with  integral  signs  here  and  in  the  sequel  denoting  a  multiple 


in  weights  wt  =^  II  ^  f(Xi|pa(Xi))  | 
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(ii) 


integration /summation  over  the  domain  of  XQ. 

in  2 

/(  E  w, }  which  can  be  calculated  from  the  sample  of 
t=l  ' 

is  such  that  m<rm  -*  a  enabling  an  estimate  of  the  accuracy  of  hm- 


£  = 


Suppose  we  seek  density  estimates  for  the  components  of  Xu  perhaps  to  obtain 
modal  values  or  quantiles.  The  X^  can  be  resampled  as  suggested  after  (2.3),  to  create 
samples  approximately  distributed  according  to  f(Xu|XQ).  Then,  using  appropriate 
coordinates,  histograms  for  the  X^  and  kernel  density  estimates  for  the  X^  can  be  created. 
Improved  estimates  utilizing  the  known  forms  f(Xu,xQ)  and  g(Xu)  are  available.  If  we 
want  an  estimate  of  f(X  |  XQ=xQ),  where  X  is  a  generic  component  of  Xu,  we  may  write, 


g(*u) 


g(*' »*u) 


where  Xu  is  the  collection  of  all  variables  in  Xu  apart  from  X  and  g(Xu)  =  Jg(xu)dx'. 
Suppose  xut=(xt.x^t)»  t=l,2,  ...,m,  are  generated  from  g(Xu).  Let 


-  #  Sht(X)w 

1(X  |x0=i0)  =  -|Wt 


(2.4) 


,  wt(X  )  ,  f(X  ,Xt,x  )  f(x'x'  ,xj 

where  ht(X  )  =  — — - ,  wt(X  )  = - ; - ,  and  w.  = - ; - .  Since  (2.4) 

*  w‘  ‘  nx;>  ‘  g(xt.xut) 

is  of  the  form  (2.3)  results  of  Geweke  apply.  In  particular,  at  any  fixed  X  ,  we  may 
estimate  the  accuracy  of  our  density  estimator  using 

?(X)  =  I  [ht(x'H(x'  |x  =x  )]2  •  w2/(  S  wt)2  (2.5) 

t=l  1  00  1  t=l  1 


/  / 

Note  that  w^  need  not  be  recomputed  with  changing  X  .  However  in  computing  wt(X  ),  a 

/  / 

univariate  integration  or  summation  over  X  is  required  to  obtain  g(Xu).  In  the  continuous 
case,  the  integration  can  be  done  quickly  and  cheaply  using  a  trapezoidal  or  Simpson’s  rule 
integration.  The  estimator  in  (2.4)  is  used  in  the  illustrative  example  of  section  4. 
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2.3  Dependent  or  Iterative  Monte  Carlo 

Rather  than  using  independent  samples  as  in  the  ISD  approach  of  section  2.2  we 
may  generate  dependent  sequences  using  a  Markov  chain  whose  equilibrium  distribution  is 
f(Xu|X0  =  x0).  Such  Markov  chain  Monte  Carlo  dates  at  least  to  Metropolis  et  al  (2953). 
A  version  which  samples  from  updated  complete  conditional  distributions  was  introduced 
as  the  Gibbs  sampler  by  Geman  and  Geman  (1984)  for  restoration  of  noisy  images.  The 
setting  is  a  high  dimensional  Markov  random  field  with  binary  nodes  built  from  local 
neighborhood  structure.  Pearl  (1987)  applied  the  Gibbs  sampler  (referring  to  it  only  as  a* 
stochastic  simulation  technique)  to  causal  models  involving  binary  variables.  Gelfand  and 
Smith  (1990,  1991)  discuss  a  version  involving  continuous  nodes  in  the  context  of 
hierarchical  Bayesian  models. 

The  attraction  of  the  Gibbs  sampler  is  that  draws  from  the  high  dimension  density 
f(Xu  |  XQ=x0)  are  obtained  by  making  draws  from  the  univariate  complete  conditional 
distributions  f(X  |  Xu>x0).  Given  a  starting  state  vector  for  Xu,  the  components  of  Xy  are 
typically  sampled  in  the  natural  order,  sometimes  referred  to  as  a  raster  scan,  with  the 
conditional  levels  of  Xu  updated  after  each  sampling  while  XQ  remains  "clamped"  at  xQ  (In 
fact,  any  infinitely  often  visiting  order  of  the  components  works).  One  pass  through  all  of 
the  components  of  XQ  is  called  an  iteration.  After  l  such  iterations  a  sampled  vector  X^ 
will  result.  Under  conditions  which  insure  that  the  iterations  X^  are  not  trapped  in  a 
portion  of  the  state  space  of  Xu,  as  /  m,  X^  converges  in  distribution  to  an  observation 
from  f(XQ|XQ=xQ).  Such  conditions  mandate  that  we  can  not  permit  any  purely 
deterministic  nodes  in  our  graphical  model.  Technically,  we  merely  remove  any  such 
nodes,  adjusting  parents  and  children  accordingly.  Convergence  will  be  at  a  geometric 
rate,  possibly  uniformly  so  (see  e.g.  Roberts  and  Poison,  1990,  Schervish  and  Carlin,  1991, 
Liu,  Wong  and  Kong,  1991  and  rather  generally,  Nummelin,  1984).  Gel  man  and  Rubin 
(1992)  show  that,  for  certain  models,  l  will  have  to  be  extremely  large  before  "acceptable" 
convergence  is  achieved  and  that  several  different  starts  for  XQ  will  be  required  to  make  a 
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convergence  assessment.  An  attractive  empirical  tool  for  assessing  convergence  is  the 
Gibbs  stopper  as  described  in  Tanner  (1991).  Unfortunately,  apart  from  special  cases,  its 
computation  becomes  prohibitive  with  increasing  n.  However,  use  of  subgraph 
approximation  results  in  a  substantially  reduced  graph,  assisting  in  the  examination  of 
convergence. 

It  is  worthwhile  to  remark  that  errors  due  to  the  quality  of  our  various 
approximations  will  likely  be  small  compared  to  the  general  uncertainty  in  the  model 
specification  itself.  For  complex  high  dimensional  models  incorporating  arbitrary* 
dependence  structure,  ttballparkn  density  estimates  may  have  to  suffice. 

Let  *is  be  more  specific  about  the  form  of  the  complete  conditional  distribution  lor 

/  /  / 

X  .  It  is  dear  that  f(X  |XuJX0=xQ)  is  proportional  to  (1.2).  Moreover,  with  regard  to 
factorization,  only  terms  involving  X  as  a  child  or  as  a  parent  need  be  considered  so  that 


f(x'|x;^0=x0).  («[x'|p^x')]  n  fIV|pa(V)])l  (2.6) 

V€ch(X)  '  ii’V 

Thus,  the  only  variables  involved  in  the  complete  conditional  distribution  of  X  ,  are  its 
parents,  its  children,  and  the  parents  of  these  children.  This  set  has  been  called  the 
Markov  blanket  by  Pearl  (1986).  Since  typically  dependence  in  the  model  is  sparse,  only  a 
few  of  the  terms  in  (1.2)  appear  in  (2.6). 

Turning  to  the  sampling  itself  we  recall  that  by  virtue  of  the  Markovian  updating 
the  conditional  levels  in  (2.6)  will  change  with  each  draw  of  X  .  In  addition,  the  form  of 
(2.6)  will  almost  never  be  a  standard  distribution  even  if  the  individual  terms  in  the 
product  are.  For  discrete  variables  sampling  is  routine  upon  standardization/summing  of 
(2.6)  although  for  ordinal  variables  rejection  methods  are  available  (Devroye,  1986).  For 
continuous  variables  we  might  also  consider  a  rejection  method  (see  Devroye,  1986  or 
Ripley  1987).  For  instance  an  envelope  function  for  (2.6)  is  Mf[X  |pa(X  )]  where 


(Vo) 


.  Typically,  f[X  |pa(X  )]  is  readily  sampled  and  M 


M=sup  n  f(V|pa(V))| 
x'  V€ch(Y)  ' 

w 

is  not  difficult  to  compute  though  it  must  be  recomputed  with  changing  levels  of  Xy.  In 
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practice  such  envelopes  tend  to  be  very  inefficient  producing  very  small  acceptance  rates. 

/  / 

This  is  not  surprising  since  the  concentration  of  mass  for  f[X  |pa(X  )]  may  be  quite 
different  from  that  of  (2.6).  Nonetheless,  we  used  such  rejection  for  our  example  in  Section 
4  since  generation  from  the  univariate  f(X/|pa(X/))  is  so  inexpensive. 

Unfortunately,  specialized  rejection  methods  such  as  in  Gilks  and  Wild  (1991)  which 
take  advantage  of  log  concavity  of  the  density  f,  squeezing  etc.  will  not  likely  be  applicable 
to  the  form  (2.6).  Other  tailored  versions  may  be  costly  and  not  readily  automated. 
Alternative  "black  box"  Gibbs  samplers  for  graphical  models  handle  continuous  variables* 
by  approximate  inversion  of  the  probability  integral  transform  as  in,  e.g.,  Tanner  (1991), 
by  the  use  of  a  modified  ratio  of  uniforms  method  as  described  in  Wakefield,  Gelfand  and 
Smith  (1991),  by  adaptive  normal  kernel  density  approximations  to  (2.6)  (Silverman,  1986, 
§5.3). 

As  for  density  estimation  utilizing  the  output  of  the  Gibbs  sampler,  suppose  we 
have  a  sample  of  m  vectors  Xut>  t=l,2,...,m,  roughly  independent  and  approximately 

distributed  from  f(Xu  |  XQs=x0).  For  component  X  we  could  create  a  histogram  from  the 

/  /  * 
set  Xt  for  discrete  X  or  a  kernel  density  estimate  (as  in  Silverman,  1986)  for  X 

continuous.  However,  Rao-Blackwellized  density  estimates, 

«(X-  |X0  =  x0)  -  i  ?J(X-  |X  V  X0  =  *„),  (S.7) 

using  the  known  structure  (2.6),  are  preferable  (see  in  particular,  Gelfand  and  Smith  1991). 

3.  Distributional  Models  and  Model  Choice 

Thus  far  we  have  assumed  complete  specification  of  the  directed  graphical  model. 
As  emphasized  earlier  the  attraction  of  these  models  is  that  the  global  structure  is 
determined  through  provision  of  local  detail.  Specification  of  local  detail  is  at  the  node 
level  in  accordance  with  the  factorization  (1.2)  and  encompasses  two  aspects  - 
identification  of  parents  and  of  conditional  distributions  at  the  node  given  levels  of  the 
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parental  variables. 

In  models  comprised  of  an  ensemble  of  variables  with,  a  priori,  little  structural 
insight,  often  the  purpose  of  the  modeling  exercise  is  to  identify  the  nature  of  the 
dependence.  In  such  contexts  edges  of  the  graph  need  not  be  directed  so  that  there  can  be 
links  between  nodes  without  labeling  a  parent  and  a  child.  The  book  of  Whitaker  (1990) 
provides  a  good  account  of  the  process  of  modeling  dependence.  See  also  the  recent  papers 
of  Wennuth  and  Lauritsen  (1990)  and  Edwards  (1990).  Our  models  are  directed,  i.e., 
restricted  to  systems  having  a  natural  flow  or  hierarchical  sequence  for  the  variables* 
whence  identification  of  parents  is  implicit.  It  is  unrealistic  to  assume  that  the  dependence 
structure  is  known  so  precisely.  Few  a  given  set  of  nodes  we  might  postulate  several 
structural  forms  i.e.  several  directed  graphical  models,  and  attempt  to  choose  amongst 
them.  We  return  to  this  issue  at  the  end  of  the  section. 


As  for  distributional  specification,  in  practice  such  distributions  should  be  elicited. 
The  Bayesian  literature  on  elicitation  may  be  useful  in  this  regard.  See,  for  example, 
Kahneman  et  al  (1982)  for  a  very  readable  discussion,  and  Kadane  et  al  (1980)  for 


implementation  suggestions.  Still,  several  functional  forms  may  be  candidate  distributions 
again  leading  to  the  question  of  model  choice.  An  attractive  feature  of  the  simulation 
methodology  developed  in  section  2  is  that  it  can  be  applied  regardless  of  the  forms  taken 
for  the  ffXjlpafX.)). 

Classes  of  models  for  discrete  nodes  are  discussed  in  Spiegelhalter  and  Lauritsen 
(199G).  In  particular,  for  an  arbitrary  node  Y  let  us  think  of  pa(Y)  as  comprised  of  two 
sets,  one  a  set  of  "causal"  parents,  jy  and  the  other  a  set  of  "parametric"  parents,  0y. 
Then  X  is  viewed  as  the  collection  of  nodes  (V,6)  where  V  is  the  set  of  all  "nosparameter" 


nodes  and  #  = 


becomes 


U  K  provides  the  overall  parametrization  for  the  model. 
YeV  * 


men  ^i.2) 


nMip&f*;)) 
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Customary  modeling  imposes  pa(£.)  c  A 

Spiegelhalter  and  Lauritzen  introduce  an  assumption  of  global  independence  which 

says  that  II  f(0-|pa(0.))  «  n  f(IL  ).  Thus  the  A,  are  presumed  to  be  independent 
^€#  V-€  V  i  vi 

sets  of  parameters.  If  all  of  the  parents  of  Y  are  discrete  variables  we  can  envision  #y  as 

breaking  into  components  each  corresponding  to  a  configuration  of  pa(Y).  Spiegelhalter 

and  Lauritzen  then  introduce  the  further  assumption  of  local  independence  if  f(#y) 

fact  ora  into  a  product  of  densities  over  these  components. 

Now  if  Y  is  a  nominal  variable  its  realisations  constitute  multinomial  sampling  in 

which  case  the  component  of  ly  corresponding  to  a  particular  configuration  of  pa(Y) 

could  be  taken  as  the  vector  of  multinomial  probabilities  for  Y  under  this  configuration. 

A  convenient  choice  of  density  for  this  component  would  be  a  Dirichlet  whence,  under  local 

independence,  f(8y)  would  be  a  product  of  Dirichlet  densities  over  all  possible  onfigurations 

of  pa(Y).  As  Spiegelhalter  and  Lauritzen  observe,  such  specification  introduces,  for  Y 

alone,  ly  of  dimension  equal  to  (#  of  levels  of  Y  -  1)  *  (#  of  configurations  of  pa(Y)). 

A  more  parsimonious  assumption,  which  also  accommodates  the  case  where  Y  has  at  least 

one  continuous  parent,  models  f(Y|  7y,ly)  as  a  parametric  family  in  ly  7y  u 

covariates.  For  instance  we  might  model  the  multinomial  probabilities  for  Y  through  a  set 

of  logits  (e.g.,  baseline  or  adjacent).  In  the  binary  case  we  would  set 

‘r-'lf(Y=0|7Y,#Y))-T(TY)  •** 

where  T(qy)  is  an  r*l  vector  of  function*  of  7y  and  ly  is  thus  an  r*l  vector  of  coefficient 
parameters.  We  might  further  assume  a  multivariate  Gaussian  density  for  Ay,  possibly 
rather  vague. 

In  the  case  of  mixed  models,  the  unique  distributional  family  which  has  been 
discussed  in  the  literature  is  the  conditional  Gaussian  family  (Lauritzen  and  Wermuth, 
1989;  Wermuth  and  Lauritzen,  1990).  ;‘L,  conditional  Gaussian  distribution  provides  a 
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joint  distribution  for  the  entire  directed  graphical  model.  It  arises  from  a  two  part 
asymmetric  construction.  The  marginal  distribution  of  the  discrete  variables  is 
multinomial  and,  condition  on  these  variables,  the  continuous  variables  have  a  multivariate 
normal  distribution.  The  form  of  the  joint  density  (1.2)  is  then  immediate  and  is 
parametrised  by  the  set  where  i  indexes  the  state  space  of  the  set  of  discrete 

nodes  and  p-  is  the  probability  of  state  i  with  pj  and  Ej  the  mean  vector  and  covariance 
matrix  respectively  for  the  conditional  multivariate  normal  distribution  given  state  i.  The 
conditional  Gaussian  family  has  an  attractive  set  of  theoretical  properties  which  are* 
summarised  in  Lauritsen  (1990b)  and  in  Whittaker  (1990).  Unfortunately,  this  family  is 
restrictive  in  two  unappealing  ways.  First,  no  discrete  node  may  have  a  continuous  parent 
and  second,  all  continuous  nodes  must  follow  a  normal  distribution.  However,  using  this 
family,  exact  calculations  can  be  carried  out  using  the  junction  tree  representation  as  in 
Lauritsen  (1990a).  If  not,  Lauritsen  (1990a)  provides  only  an  approximate  computational 
method  using  second  order  Taylor  series  expansions  which  produces  moment  estimates  for 
the  conditional  density  f(Y|X0=x0),  but  no  density  estimates. 

We  propose  handling  continuous  nodes  as  we  did  the  discrete  nodes  above,  i.e., 
assuming  f(Y|  Ty,0y)  to  be  a  parametric  family  in  iy  with  jy  u  covariates.  A  broad  and 
convenient  class  of  specifications  would  be  those  of  generalized  linear  models  (McCullagh 
and  Nelder,  1989)  where  f(Y|7y,0y)  is  a  univariate  continuous  exponential  family  and  a 
suitable  link  function  connects  the  mean  of  Y  with  a  linear  form  in  9y.  Discrete  nodes 
modeled  as  e.g.,  binomial  or  Poisson,  could  also  be  handled  in  this  fashion. 

We  return  to  the  matter  of  model  choice.  Consider  two  competing  fully  specified 
directed  graphical  models  to  describe  a  system.  The  models  may  differ  in  terms  of  parental 
specification  at  one  or  more  nodes  and/or  distributional  specification  at  one  or  more  nodes. 
How  do  we  choose  between  the  models?  The  question  is  not  well  formulated  unless  we  ask 
it  with  regard  to  data  observed  at  components  of  the  system.  Suppose  XQ  denotes  the  set 
of  nodes  at  which  we  take  observations  and  suppose  we  observe  independent  xoP  1  ~ 
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Note  that  the  xQj  may  be  real  data  generated  from  the  system  or  artificial  data  generated 
under  some  joint  distribution  for  XQ  which  we  would  like  the  system  to  emulate.  It  seems 
natural  to  choose  the  model  more  likely  to  have  yielded  this  data.  Using  the  methodology 
in  section  2,  we  can  calculate  under  each  model  an  estimate  of  the  density  f(XQ),  hence,  an 
estimate  of  nf(x0^.  The  mcdel  with  the  larger  value  of  this  product  would  be  chosen. 


Unless  the  dimension  of  XQ  is  small,  a  very  large  L  may  be  required  to  obtain  a  satisfactory 

estimate  of  f(XQ).  Instead  we  might  replace  f(XQ)  by  II  f(X  |XQ),  where  XQ  is  the 

X'€X0 

collection  of  all  variables  in  XQ  apart  from  X  .  The  densities  in  this  product  are  univariate 
and  can  be  straightforwardly  estimated  using  ideas  in  a  ction  2.  In  particular,  with  Xu 
defined  as  before,  suppose  XQt,  t=l,2,...,m,  is  a  sample  from  f(xQ|  x^).  Then 

f(x'  I*')  =  m-1  E  f(x'|Xut,x^)  (3.1) 

Note  that  (3.1)  is  a  mixture  of  complete  conditional  distributions  for  x' .  See  Gelfand,  Dey 
and  Chang  (1992)  for  further  discussion  of  the  use  of  such  cross  validated  densities  in  the 
context  of  model  selection. 


4.  An  Illustrative  Example 

Accurate  diagnosis  of  congenital  heart  disease  immediately  after  birth  increases  the 
newborn’s  chances  of  survival  but  is  a  far  from  simple  procedure.  In  the  interest  of 
communication  of  preliminary  symptoms  between  admitting  and  specialist  physicians  and 
education  of  medical  students,  a  questionnaire  was  developed  to  aid  doctors  during  an 
infant's  referral  to  the  specialist  hospital  (Franklin  et  al.,  1991,  Spiegelhalter  et  al.,  1991). 
A  directed  graph  representing  one  aspect  of  congenital  heart  disease  appears  in  figure  3. 

[Insert  figure  3  about  here] 
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The  graph  and  extensive  conditional  probability  tables  quantifying  the  associations 
between  disease,  symptoms,  evidence  and  risk  factors  were  constructed  in  consultation  with 
a  group  of  pediatric  cardiologists,  and  used  by  Spiegelhalter  and  Cowell  (1991)  to  describe 
learning  abont  unobserved  variables  (in  this  particular  case  all  internal  nodes).  The 
conditional  probability  tables  were  kindly  provided  by  Dr  Spiegelhalter  to  aid  our  analysis. 
In  their  analysis  all  nodes  in  the  graph  were  assumed  discrete.  In  fact,  several  of  the  nodes 
such  as  11,  12,  16  and  17  are  inherently  continuous,  but  a  CG  family  of  models  cannot 
permit  the  first  two  to  of  these  to  be  so. 

We  analyze  this  graphical  model  by  the  Monte  Carlo  methods  of  Section  2.  Logit 
forms  were  used  to  describe  the  conditional  distributions  at  the  discrete  nodes,  while  the 
continuous  nodes  (11,  12,  16  and  17)  were  assumed  to  be  normal  on  the  logarithmic  scale. 
The  conditional  normal  distributions  for  the  continuous  nodes  were  chosen  (i.e.,  mean  and 
variance)  to  yield  probabilities  which  essentially  match  the  interval  probabilities  in  the 
provided  probability  tables.  Table  1  presents  marginalization  results,  listing  for  both  the 
Gibbs  and  the  importance  sampling  method,  estimates  of  marginal  probabilities  for  the 
discrete  nodes  and  estimates  of  the  mean  and  variance  for  the  continuous  nodes.  All 
estimates  presented  are  based  on  5000  random  generations.  Differences  in  the  estimates 
achieved  by  these  two  methods  are  small  and  within  anticipated  variability  based  on  the 
number  of  replications.  Plots  of  the  continuous  marginal  density  estimates  based  on  the 
same  number  of  generations  from  both  techniques  are  overlaid  in  figures  4a  through  4d. 
Finally,  to  illustrate  the  revision  of  probabilities  under  given  information,  we  update  the 
probabilities  for  each  of  the  diagnoses  at  node  3  given  clinical  data  (levels  of  nodes  15 
through  20).  For  instance  if  LVH  was  reported,  RUQ  02  =  1.5  (which  corresponds  to 
4.48  units),  Lower  Body  02  =  1.4  (4.05  units),  reported  C02>7.5,  X-ray  was  normal,  and 
grunting  was  reported,  the  probabilities  for  disease  (node  3)  become, 
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PFC 


TGA 


Fallot  PAIVS  TAPVD  Lung 

0.02833  0.17419  0.16339  0.53830  0.04844  0.04735  (GIBBS) 

0.01622  0.17838  0.16103  0.52048  0.06454  0.05936  (Imp.  Samp.) 

and  can  be  compared  to  the  results  in  Table  1.  The  evidence  point  towards  PAIVS  and 
away  from  lung  diseases  (influenced  by  the  report  of  a  normal  X-ray). 


Table  1.  Congenital  Heart  Disease  Example;  Initial  State  (no  evidence).  Estimates, 
of  probabilities  of  discrete  variables  and  means  and  variances  of  continuous  nodes  as 
produced  by  the  Gibbs  sampler  (first  entry),  and  Importance  Sampling  (second  entry), 
based  on  5000  generations. 


Yes 

No 

0.09859 

0.90141 

0.09830 

0.90170 

Aye 

0-3  days 

4-10  days 

11-30  days 

0.33098 

0.33335 

0.33568 

0.33320 

0.33300 

0.33380 

(3)  Disease 

PFC  TGA  Fallot  PAIVS  TAPVD  Lung 

0.03816  0.23229  0.34924  0.20520  0.13819  0.03692 

0.04120  0.24320  0.35340  0.20280  0.12700  0.03240 


(4)  Duct  Flow 

Left  to  Right  None  Right  to  Left 

0.56197  0.32355  0.11449 

0.55810  0.32750  0.11440 


(5) 

(6) 

(7) 


Cardiac  Mixing 


None 

Mild 

Complete 

Transp. 

0.03868 

0.12052 

0.64088 

0.19993 

0.04340 

0.12070 

0.62480 

0.21110 

Lung  Parench 

Nonxml 

Congested 

Abnormal 

0.49429 

0.25894 

0.24677 

0.50670 

0.26170 

0.23160 

Lung  Flow 

Normal 

Low 

High 

0.18923 

0.51253 

0.29824 

0.19260 

0.51070 

0.29670 
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TtkU  1.  (Cont’d) 


(1?)  Lower  Body 

Mean:  2.10503  Variance:  0.53596 

Mean:  2.15617  Variance:  0.53570 
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Table  1.  (Coat’d) 


(18)  CO^  Report 


<7.5 

>=7.5 

0.80591 

0.19149 

0.81530 

0.18470 

(19)  X-ray  Report 

Normal  Oligaemic 

Plethoric 

Grd/Glass 

Asy/Pathcy 

0.18711 

0.11816 

0.19252 

0.24481 

0.25740 

0.18710 

0.11920 

0.19558 

0.24490 

0.25300 

(20)  Grunting 
Yes 

Hearted? 

0.35788 

0.64212 

0.35650 

0.64350 
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graphical  model  for  diagnosis  of  heart  disease  in  neonates. 


Method 


Hypoxia  in  02 

Gibbs  Sampler  . Imp.  Sampling 


figure  fa  Hypoxia  in  Oj.  Plot  of  marginal  density  estimates  using  Gibbs  and  importance 
sampling,  m  * 1000  replications. 
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figtrt  4b.  C02.  Plot  of  marginal  density  estimates  using  Gibbs  and  importance  tempting, 
m  « 1000  rqdicationi. 
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figure  4c.  EUQ  Oj.  Plot  of  marginal  density  estimates  using  Gibbs  and  importance 
•empUag,  m  « 1000  indications. 
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figure  4i.  Lover  Body  02-  Plot  of  marginal  density  estimates  osin*.  Gibbs  and  importance 
sampling,  m  =  1000  replications. 
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ABSTRACT 


In  formulating  models  for  &  complex  system  graphical  representation  is  an  effective 
tool.  When  the  components  of  the  system  are  viewed  as  random  variables,  directed 
graphical  modds  detail  the  nature  of  the  dependence  among  them.  Moreover,  if  for  each 
variable  the  conditional  distribution  is  provided  according  to  the  graph,  the  joint 
distribution  is  uniquely  determined.  Natural  questions  arise  about  the  static  behavior  of 
the  system  under  such  specification  as  well  as  its  response  to  information  (observed  levels 
of  some  of  the  variables).  Answers  to  these  questions  require  the  ability  to  calculate 
arbitrary  marginal  and  conditional  distributions.  In  complex  cases  (high  dimensional 
structures)  such  calculations  require  high  dimensional  integrations  and/or  summations. 
Most  of  the  work  to  date  has  taken  advantage  of  properties  of  directed  graphs  to  facilitate 
exact  calculations  but  is  limited  with  regard  to  distributional  assumptions  and  feasible 
system  size.  Monte  Carlo  methods  for  such  calculations  can  accommodate  much  larger 
system  size  with  arbitrary  dependence  structure  and  distributional  forms  yielding 
approximations  which  can  be  as  accurate  as  desired.  It  is  the  objective  of  this  paper  to 
detail  such  methodology.  An  illustration  is  provided  using  a  diagnostic  system  for 
congenital  heart  disease  in  neonates. 


